Spencer Parr Thesis Analysis

Info

Study Overview:
The primary objective of this study is to assess the impact of Clionid sponges on coral reef ecosystems in the U.S. Virgin Islands. Coral reefs are critical for biodiversity and marine life, but have faced significant declines due to environmental stressors, including climate change, ocean acidification, and disease outbreaks. Clionid sponges, as bioeroders, contribute to coral degradation by breaking down the calcium carbonate structure of corals, accelerating reef decline. By examining the prevalence and interactions of Cliona sponges with coral over time, this research aims to understand how these sponges influence reef health and resilience, particularly in response to environmental disturbances such as bleaching events and hurricanes.

This study utilizes long-term data collected by the Territorial Coral Reef Monitoring Program (TCRMP) from 2009 to 2023 across 34 reef sites in the USVI. The research focuses on three main questions: (1) how Cliona cover and coral interactions have changed over time, (2) how shifts in coral abundance affect Cliona-coral interactions, and (3) whether Cliona presence correlates with coral health, particularly disease prevalence. Through statistical modeling, including generalized linear models (GLMs) and linear mixed-effects models (LMEMs), the study will analyze the relationship between Cliona prevalence and environmental variables such as temperature, water quality, and coral species. This analysis is intended to reveal patterns of Cliona recruitment on stressed or diseased corals and to identify species-specific vulnerabilities, providing insights into the adaptive mechanisms of Cliona and guiding reef conservation strategies in light of changing ocean conditions.

Code Explanation:
The R scripts provided manage and manipulate TCRMP data, preparing it for statistical analysis. Key steps include filtering Cliona species data, calculating prevalence, and integrating environmental factors. Models such as Linear Mixed-Effects Models (LMEMs) and Generalized Linear Models (GLMs) are applied to assess Cliona recruitment patterns, coral interactions, and environmental influences.

Spencer Parr

Metadata

Coral Health Data:
Coral health data is the base structure of the the TCRMP data collection.

# Metadata table as a data frame
metadata <- data.frame(
  Columns = c(
    "Location", "SampleDate", "SampleYear", "SampleMonth", "Period", "SampleType", "Method", 
    "Recorder", "Transect", "SPP", "Size", "Interactions", "Predation", "Notes", "Damage", 
    "BLP", "BLII", "BL, P, VP, SP", "BLType", "BLCause", "CW", "CWLight", "CWDark", 
    "OldMort", "RecMort", "Disease", "NoLesion", "LesionPattern", "LesionEdge"
  ),
  Description = c(
    "Monitoring location at which data was collected",
    "Date on which data was collected",
    "Monitoring period in which data was collected; Note - Sample year may not always match the year from the SampleDate if a location was surveyed late in the annual monitoring period",
    "Month in which data was collected; Note - when a location was not surveyed fully in the same month, the month in which sampling began is used as the sample month",
    "Refers to the sample period in which the data falls; Annual, PeakBL, PostBL, SCTLD, WS, and Juvenile are the periods",
    "Indicates the type of transect (random or permanent) from which data was collected",
    "Indicates the way in which data was collected (e.g., line intercept, belt transect)",
    "Diver who recorded the data",
    "Transect number; Typically 6 transects per site, but additional transects may have been recorded",
    "Coral identified to species; If not identifiable to species, genus is used",
    "Coral colony size; Length (maximum planar diameter), Width (perpendicular to length), and Height (perpendicular to plane of growth)",
    "Refers to other benthic entities interacting with living coral tissue; Includes macroalgae, sponges, cyanobacteria, etc.",
    "Type and number of predators (e.g., damselfish, Coralliophila sp.)",
    "Additional notes about the colony condition",
    "Indicates the type of damage experienced by the colony (e.g., broken, overturned)",
    "Indicates the bleaching status of a colony; Used before percent bleaching and paling were recorded",
    "Indicates a secondary bleaching condition in addition to 'BLP'",
    "Represents the proportion of the colony's tissue affected by bleaching or paling; VP and SP indicate intermediary stages",
    "Refers to the type or location of bleaching on a colony (e.g., Focal, Medial, Granular)",
    "Notes the cause of observed bleaching (e.g., under Lobophora sp.)",
    "Represents the proportion of tissue at the light end of the Coral Watch colony color scale",
    "Lightest colony living tissue color observed (Coral Watch scale: 1-6)",
    "Darkest colony living tissue color observed (Coral Watch scale)",
    "Proportion of the colony skeletal structure considered old partial mortality",
    "Proportion of the colony skeletal structure considered recent partial mortality",
    "Presence of Caribbean coral diseases; Proportion of the colony affected by lesions",
    "Number of lesions present on the colony",
    "Describes the location of rapid tissue loss diseases (e.g., at edges or center of colony)",
    "Describes characteristics of the active lesion margin (e.g., smooth line, irregular pattern)"
  )
)

# Create the table
metadata %>%
  kbl(caption = "Metadata for Benthic Coral Health Data") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    position = "left"
  )
Metadata for Benthic Coral Health Data
Columns Description
Location Monitoring location at which data was collected
SampleDate Date on which data was collected
SampleYear Monitoring period in which data was collected; Note - Sample year may not always match the year from the SampleDate if a location was surveyed late in the annual monitoring period
SampleMonth Month in which data was collected; Note - when a location was not surveyed fully in the same month, the month in which sampling began is used as the sample month
Period Refers to the sample period in which the data falls; Annual, PeakBL, PostBL, SCTLD, WS, and Juvenile are the periods
SampleType Indicates the type of transect (random or permanent) from which data was collected
Method Indicates the way in which data was collected (e.g., line intercept, belt transect)
Recorder Diver who recorded the data
Transect Transect number; Typically 6 transects per site, but additional transects may have been recorded
SPP Coral identified to species; If not identifiable to species, genus is used
Size Coral colony size; Length (maximum planar diameter), Width (perpendicular to length), and Height (perpendicular to plane of growth)
Interactions Refers to other benthic entities interacting with living coral tissue; Includes macroalgae, sponges, cyanobacteria, etc.
Predation Type and number of predators (e.g., damselfish, Coralliophila sp.)
Notes Additional notes about the colony condition
Damage Indicates the type of damage experienced by the colony (e.g., broken, overturned)
BLP Indicates the bleaching status of a colony; Used before percent bleaching and paling were recorded
BLII Indicates a secondary bleaching condition in addition to ‘BLP’
BL, P, VP, SP Represents the proportion of the colony’s tissue affected by bleaching or paling; VP and SP indicate intermediary stages
BLType Refers to the type or location of bleaching on a colony (e.g., Focal, Medial, Granular)
BLCause Notes the cause of observed bleaching (e.g., under Lobophora sp.)
CW Represents the proportion of tissue at the light end of the Coral Watch colony color scale
CWLight Lightest colony living tissue color observed (Coral Watch scale: 1-6)
CWDark Darkest colony living tissue color observed (Coral Watch scale)
OldMort Proportion of the colony skeletal structure considered old partial mortality
RecMort Proportion of the colony skeletal structure considered recent partial mortality
Disease Presence of Caribbean coral diseases; Proportion of the colony affected by lesions
NoLesion Number of lesions present on the colony
LesionPattern Describes the location of rapid tissue loss diseases (e.g., at edges or center of colony)
LesionEdge Describes characteristics of the active lesion margin (e.g., smooth line, irregular pattern)
# Notes as a data frame
notes <- data.frame(
  ID = 1:11,
  Note = c(
    "Transects are 10m in length with typically 6 per site, but additional transects may be present.",
    "Prior to 2007 only colonies with a max planar diameter greater than 10 cm were assessed.",
    "Prior to 2005, there were lapses in recording some variables.",
    "Coral interactions have been recorded from 2009 onward.",
    "Highlighted data has not been verified due to missing data sheets.",
    "Use of CW, CWLight, and CWDark was phased out in September 2015.",
    "Macroalgae previously identified as 'encrusting Lobophora sp.' was changed to Peyssonnelia sp in October 2015.",
    "Ramicrusta sp was officially added as a macroalgae interaction in 2017.",
    "Stony Coral Tissue Loss Disease (SCTLD) was first observed in the US Virgin Islands in January 2019.",
    "Additional columns (NoLesion, LesionPattern, LesionEdge) were added in February 2020.",
    "During the sample period 'Juvenile', specific transects were assessed for coral recruitment."
  )
)

# Create the notes table
notes %>%
  kbl(caption = "Additional Notes for Coral Health Metadata") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    position = "left"
  )
Additional Notes for Coral Health Metadata
ID Note
1 Transects are 10m in length with typically 6 per site, but additional transects may be present.
2 Prior to 2007 only colonies with a max planar diameter greater than 10 cm were assessed.
3 Prior to 2005, there were lapses in recording some variables.
4 Coral interactions have been recorded from 2009 onward.
5 Highlighted data has not been verified due to missing data sheets.
6 Use of CW, CWLight, and CWDark was phased out in September 2015.
7 Macroalgae previously identified as ‘encrusting Lobophora sp.’ was changed to Peyssonnelia sp in October 2015.
8 Ramicrusta sp was officially added as a macroalgae interaction in 2017.
9 Stony Coral Tissue Loss Disease (SCTLD) was first observed in the US Virgin Islands in January 2019.
10 Additional columns (NoLesion, LesionPattern, LesionEdge) were added in February 2020.
11 During the sample period ‘Juvenile’, specific transects were assessed for coral recruitment.
site_metadata <- data.frame(
  Location = c(
    "Coral Bay", "Fish Bay", "Meri Shoal", "Black Point", "Botany Bay", "Brewers Bay",
    "Buck Island STT", "Coculus Rock", "College Shoal East", "Flat Cay", "Ginsburg Fringe",
    "Grammanik Tiger FSA", "Hind Bank East FSA", "Magens Bay", "Ruperts Rock", "Savana",
    "Seahorse Cottage Shoal", "South Capella", "South Water", "St James", "Buck Island STX",
    "Buck Island STX Deep", "Cane Bay", "Cane Bay Deep", "Castle", "Eagle Ray", "Great Pond",
    "Jacks Bay", "Kings Corner", "Lang Bank EEMP", "Lang Bank Red Hind FSA", 
    "Mutton Snapper FSA", "Salt River Deep", "Salt River West", "Sprat Hole"
  ),
  Code = c(
    "CRB", "FSB", "MSR", "BPT", "BTY", "BWR", "BIT", "CKR", "CSE", "FLC", "GBF", 
    "GKT", "HBE", "MGN", "RRK", "SVN", "SHR", "SCP", "SWT", "SSJ", "BIX", "BID",
    "CBS", "CBD", "CST", "EGR", "GRP", "JKB", "KGC", "LBP", "LBH", "MTS", "SRD", "SRW", "SPH"
  ),
  Island = c(
    "STJ", "STJ", "STJ", "STT", "STT", "STT", "STT", "STT", "STT", "STT", "STT", 
    "STT", "STT", "STT", "STT", "STT", "STT", "STT", "STT", "STT", "STX", "STX", 
    "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX"
  ),
  Latitude = c(
    18.33797, 18.31417, 18.24447, 18.34450, 18.35738, 18.34403, 18.27883, 18.31257, 18.18568,
    18.31822, 18.18770, 18.18885, 18.20217, 18.37425, 18.327433, 18.34064, 18.29467, 18.26267,
    18.28068, 18.29459, 17.78500, 17.80659, 17.77388, 17.77661, 17.76278, 17.76150, 17.71097,
    17.74337, 17.69116, 17.72145, 17.82427, 17.63660, 17.78523, 17.78530, 17.73400
  ),
  Longitude = c(
    -64.70402, -64.76408, -64.75862, -64.98595, -65.03442, -64.98435, -64.89833, -64.86058, 
    -65.07677, -64.99104, -64.95998, -64.95659, -65.00158, -64.93438, -64.925967, -65.08205, 
    -64.86750, -64.87237, -64.94592, -64.83238, -64.60917, -64.59935, -64.81350, -64.81522,
    -64.59743, -64.69880, -64.65221, -64.57160, -64.90008, -64.54706, -64.44963, -64.86240,
    -64.75917, -64.75940, -64.89540
  ),
  ReefComplex = c(
    "Nearshore", "Nearshore", "Mesophotic", "Nearshore", "Nearshore", "Nearshore",
    "Offshore", "Nearshore", "Mesophotic", "Offshore", "Mesophotic", "Mesophotic", 
    "Mesophotic", "Nearshore", "Nearshore", "Offshore", "Offshore", "Offshore", 
    "Offshore", "Offshore", "Offshore", "Mesophotic", "Nearshore", "Mesophotic", 
    "Offshore", "Offshore", "Nearshore", "Nearshore", "Nearshore", "Mesophotic", 
    "Mesophotic", "Offshore", "Mesophotic", "Nearshore", "Nearshore"
  ),
  Depth = c(
    9, 6, 30, 9, 8, 6, 14, 7, 30, 12, 63, 38, 39, 7, 5, 9, 20, 20, 20, 15, 15, 33, 
    10, 38, 7, 10, 6, 14, 17, 27, 33, 24, 30, 11, 8
  ),
  YearAdded = c(
    2011, 2002, 2005, 2003, 2002, 2002, 2005, 2002, 2003, 2003, 2011, 2003, 2003, 
    2002, 2022, 2003, 2003, 2003, 2005, 2005, 2002, 2017, 2002, 2009, 2003, 2002, 
    2003, 2002, 2007, 2009, 2002, 2003, 2009, 2002, 2002
  )
)

# Create a styled table
site_metadata %>%
  kbl(caption = "Survey Sites Metadata") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = TRUE,
    position = "left"
  )
Survey Sites Metadata
Location Code Island Latitude Longitude ReefComplex Depth YearAdded
Coral Bay CRB STJ 18.33797 -64.70402 Nearshore 9 2011
Fish Bay FSB STJ 18.31417 -64.76408 Nearshore 6 2002
Meri Shoal MSR STJ 18.24447 -64.75862 Mesophotic 30 2005
Black Point BPT STT 18.34450 -64.98595 Nearshore 9 2003
Botany Bay BTY STT 18.35738 -65.03442 Nearshore 8 2002
Brewers Bay BWR STT 18.34403 -64.98435 Nearshore 6 2002
Buck Island STT BIT STT 18.27883 -64.89833 Offshore 14 2005
Coculus Rock CKR STT 18.31257 -64.86058 Nearshore 7 2002
College Shoal East CSE STT 18.18568 -65.07677 Mesophotic 30 2003
Flat Cay FLC STT 18.31822 -64.99104 Offshore 12 2003
Ginsburg Fringe GBF STT 18.18770 -64.95998 Mesophotic 63 2011
Grammanik Tiger FSA GKT STT 18.18885 -64.95659 Mesophotic 38 2003
Hind Bank East FSA HBE STT 18.20217 -65.00158 Mesophotic 39 2003
Magens Bay MGN STT 18.37425 -64.93438 Nearshore 7 2002
Ruperts Rock RRK STT 18.32743 -64.92597 Nearshore 5 2022
Savana SVN STT 18.34064 -65.08205 Offshore 9 2003
Seahorse Cottage Shoal SHR STT 18.29467 -64.86750 Offshore 20 2003
South Capella SCP STT 18.26267 -64.87237 Offshore 20 2003
South Water SWT STT 18.28068 -64.94592 Offshore 20 2005
St James SSJ STT 18.29459 -64.83238 Offshore 15 2005
Buck Island STX BIX STX 17.78500 -64.60917 Offshore 15 2002
Buck Island STX Deep BID STX 17.80659 -64.59935 Mesophotic 33 2017
Cane Bay CBS STX 17.77388 -64.81350 Nearshore 10 2002
Cane Bay Deep CBD STX 17.77661 -64.81522 Mesophotic 38 2009
Castle CST STX 17.76278 -64.59743 Offshore 7 2003
Eagle Ray EGR STX 17.76150 -64.69880 Offshore 10 2002
Great Pond GRP STX 17.71097 -64.65221 Nearshore 6 2003
Jacks Bay JKB STX 17.74337 -64.57160 Nearshore 14 2002
Kings Corner KGC STX 17.69116 -64.90008 Nearshore 17 2007
Lang Bank EEMP LBP STX 17.72145 -64.54706 Mesophotic 27 2009
Lang Bank Red Hind FSA LBH STX 17.82427 -64.44963 Mesophotic 33 2002
Mutton Snapper FSA MTS STX 17.63660 -64.86240 Offshore 24 2003
Salt River Deep SRD STX 17.78523 -64.75917 Mesophotic 30 2009
Salt River West SRW STX 17.78530 -64.75940 Nearshore 11 2002
Sprat Hole SPH STX 17.73400 -64.89540 Nearshore 8 2002
# Metadata as a data frame
benthic_cover_metadata <- data.frame(
  Columns = c(
    "SampleYear", "SampleMonth", "Period", "Location", "FilmDate",
    "NoPts", "AnalysisBy", "AnalysisDate", "Transect", "BenthicCover", "Notes"
  ),
  Description = c(
    "Monitoring period in which data was collected; Sample year may not always match the year from the FilmDate if a location was surveyed late in the annual monitoring period.",
    "Month in which data was collected; when a location was not surveyed fully in the same month, the month in which sampling began is used.",
    "Refers to the sample period in which the data falls; Periods include Annual, PeakBL, PostBL, SCTLD, and WS.",
    "Monitoring location at which data was collected.",
    "Date on which transects were filmed.",
    "Total number of random assignment points analyzed for the transect.",
    "Individual who performed the benthic cover analysis.",
    "Date on which benthic cover analysis was done.",
    "Transect number; typically 6 transects per site, but additional transects may have been filmed.",
    "All remaining columns refer to individual benthic cover categories. Values reported in 'BenthicData2017-PresRAW' are point totals, while values in 'BenthicData' are calculated benthic cover.",
    "Additional notes pertaining to benthic cover."
  )
)

# Create the table
benthic_cover_metadata %>%
  kbl(caption = "TCRMP Benthic Cover Metadata") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    position = "left"
  )
TCRMP Benthic Cover Metadata
Columns Description
SampleYear Monitoring period in which data was collected; Sample year may not always match the year from the FilmDate if a location was surveyed late in the annual monitoring period.
SampleMonth Month in which data was collected; when a location was not surveyed fully in the same month, the month in which sampling began is used.
Period Refers to the sample period in which the data falls; Periods include Annual, PeakBL, PostBL, SCTLD, and WS.
Location Monitoring location at which data was collected.
FilmDate Date on which transects were filmed.
NoPts Total number of random assignment points analyzed for the transect.
AnalysisBy Individual who performed the benthic cover analysis.
AnalysisDate Date on which benthic cover analysis was done.
Transect Transect number; typically 6 transects per site, but additional transects may have been filmed.
BenthicCover All remaining columns refer to individual benthic cover categories. Values reported in ‘BenthicData2017-PresRAW’ are point totals, while values in ‘BenthicData’ are calculated benthic cover.
Notes Additional notes pertaining to benthic cover.
# Notes as a data frame
benthic_cover_notes <- data.frame(
  ID = 1:9,
  Note = c(
    "Benthic transects are the same ones on which coral health metrics are recorded. Each is 10m in length, typically 6 per site, though additional transects may have been recorded.",
    "Coral Point Count with Excel Extensions (CPCe) software was used for analysis through 2018. After 2018, random point assignment was done using R Studio.",
    "The number of points per image has varied throughout the monitoring program: 10 (2001-2011), 15 (2012-2013), and 20 (2014-present).",
    "Rhodolith (RO) was added as a benthic cover category in 2011.",
    "Peyssonnelia sp (PEY) was added as a benthic cover category in 2014.",
    "Ramicrusta sp (RAMI) was added as a benthic cover category in 2017.",
    "Data in red indicates values were unable to be verified during QAQC because the original analysis file was missing or corrupted.",
    "Stony Coral Tissue Loss Disease (SCTLD) was added as a coral condition category after its arrival in the US Virgin Islands in 2019.",
    "Cells containing 'ND' indicate that there is no data recorded for that benthic category."
  )
)

# Create the notes table
benthic_cover_notes %>%
  kbl(caption = "Additional Notes for TCRMP Benthic Cover Metadata") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    position = "left"
  )
Additional Notes for TCRMP Benthic Cover Metadata
ID Note
1 Benthic transects are the same ones on which coral health metrics are recorded. Each is 10m in length, typically 6 per site, though additional transects may have been recorded.
2 Coral Point Count with Excel Extensions (CPCe) software was used for analysis through 2018. After 2018, random point assignment was done using R Studio.
3 The number of points per image has varied throughout the monitoring program: 10 (2001-2011), 15 (2012-2013), and 20 (2014-present).
4 Rhodolith (RO) was added as a benthic cover category in 2011.
5 Peyssonnelia sp (PEY) was added as a benthic cover category in 2014.
6 Ramicrusta sp (RAMI) was added as a benthic cover category in 2017.
7 Data in red indicates values were unable to be verified during QAQC because the original analysis file was missing or corrupted.
8 Stony Coral Tissue Loss Disease (SCTLD) was added as a coral condition category after its arrival in the US Virgin Islands in 2019.
9 Cells containing ‘ND’ indicate that there is no data recorded for that benthic category.

Cliona Prevalence- Coral Health Data

This code explains the data management/manipulation and graphical evidence of Cliona prevalence for each TCRMP location

  1. The data is read into R using read_csv, and then selected columns are extracted, specifically: Period, SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, and Spo2. This selection reduces the dataset to only the columns relevant to this analysis.

  2. Next, a new column called presence is created to indicate whether there is evidence of Cliona presence on each sample. The mutate function is used along with if_else to assign a value of 1 to presence if any of the following conditions are met: the Cliona value is greater than zero, Spo1ID or Spo2ID columns contain either “CLSP” or “BOSP”. If none of these conditions are met, presence is set to 0. After this, presence values are checked for any missing data, and replace_na is used to fill any NA values with 0, ensuring that presence is either 1 or 0 without any missing values.

  3. The code then applies several filters to refine the dataset. First, it filters for rows where Period is either “Annual” or “PostBL”, SampleType is either “Permanent” or “permanent”, and Method is either “intercept” or “50 cm belt”. This step ensures that only specific types of samples are kept for the analysis. Further, rows where Transect contains the letter “A” are removed, ensuring that only certain transect data is included. Finally, the code filters for rows where SampleYear is 2005 or later, focusing on more recent data.

  4. In the last step, additional transformations are applied. A new ID column is created to provide a unique identifier for each row, generated as a sequence from 1 to the number of rows in the filtered dataset. The SampleYear, Location, and Transect columns are converted to factor data types using as.factor to prepare them for categorical analysis. This finalizes the dataset with all necessary transformations, making it ready for further analysis or modeling.

#1
cliona <- read_csv("../Cliona.csv")%>%
  select(Period,SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, Spo2)%>%
  #2
  mutate(presence=if_else(Cliona>0 | Spo1ID %in% c("CLSP", "BOSP") | Spo2ID %in% c("CLSP", "BOSP"),1,0))%>%
  mutate(presence=replace_na(presence,0))%>%
  #3
  filter(Period%in% c("Annual","PostBL")&SampleType%in% c("Permanent","permanent")&Method%in% c("intercept","50 cm belt"))%>%
  filter(!grepl("A",Transect))%>%
  filter(SampleYear>=2005)%>%
  #4
  mutate(ID=1:nrow(.),
         SampleYear=as.factor(SampleYear),
         Location=as.factor(Location),
         Transect=as.factor(Transect))
  1. This R code calculates Cliona sponge prevalence across transects by grouping the cliona dataset by Location, SampleYear, and Transect. Within each group, it calculates three metrics: prev (the average presence of Cliona, indicating prevalence), freq (total occurrences of Cliona), and total (total observations in that group). After ungrouping the data, it adds a unique ID to each row. The resulting transectprev dataset provides summarized prevalence data for each unique combination of location, year, and transect.
transectprev <- cliona %>%
  group_by(Location, SampleYear, Transect) %>%
  summarise(prev = mean(presence), freq = sum(presence), total = n())%>%
  ungroup()%>%
  mutate(ID= 1:nrow(.))
  1. This code creates a summary data frame, location_means, showing the average Cliona sponge prevalence for each location. It groups the transectprev data by Location and then calculates the mean of prev (prevalence) within each group, saving the result as mean_prev. The resulting location_means data frame has each location’s name and its corresponding average Cliona prevalence.
location_means <- transectprev %>%
  group_by(Location) %>%
  summarise(mean_prev = mean(prev))
  1. This code generates a plot that visualizes the prevalence of Cliona sponges at different reef sites by creating a boxplot for each location. The ggplot function uses the transectprev dataset, which contains Cliona prevalence data, and arranges the locations on the x-axis in descending order of prevalence. Each boxplot represents the range and distribution of Cliona prevalence values at that location. To enhance the visualization, the code includes a stat_summary layer that adds a green diamond marker at the mean prevalence for each location, making it easy to compare average Cliona prevalence across sites.

    ggplot(transectprev, aes(x = reorder(Location, -prev), y = prev)) +
      geom_boxplot(fill = "lightblue") +
      stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "darkgreen") +
      labs(title = "Prevalence of Cliona Sponges by Location (with Mean Prevalence Marked)",
           x = "Location",
           y = "Prevalence") +
      theme_minimal() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

This plot is important because it reveals the variability in Cliona prevalence across different reef sites. Locations such as Salt River West, Brewers Bay, and Savana show significantly higher prevalence levels, as indicated by both the wider range and higher position of the boxplots and mean markers, compared to other sites with much lower prevalence. This variability suggests that certain sites may be more susceptible to Cliona sponge colonization, potentially due to environmental conditions or other factors specific to those areas.

  1. This code produces a set of line graphs showing the prevalence of Cliona sponges over time, broken down by transect and location. Each location has its own panel (using facet_wrap), and within each panel, different transects are represented by distinct lines and colors. The x-axis represents the sample year, while the y-axis shows the prevalence of Cliona sponges. The lines connect the prevalence data points over time for each transect, while points mark each specific measurement. The scale_x_continuous function formats the x-axis labels to show each unique year in the dataset, improving clarity.
# Convert SampleYear to numeric, if appropriate
# transectprev$SampleYear <- as.numeric(transectprev$SampleYear)
transectprevnum<-transectprev
transectprevnum$SampleYear <- as.numeric(transectprevnum$SampleYear)
# Then plot with the updated variable
ggplot(transectprevnum, aes(x = SampleYear, y = prev, color = factor(Transect), group = Transect)) +
  geom_line() +
  geom_point() +
  facet_wrap(~Location, scales = "free_y") +
  labs(title = "Prevalence Over Years by Transect",
       x = "Year",
       y = "Prevalence",
       color = "Transect") +
  scale_x_continuous(breaks = unique(transectprevnum$SampleYear), labels = unique(transectprevnum$SampleYear)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))

This graph is valuable because it allows for temporal analysis of Cliona prevalence trends at multiple scales: across years, transects, and locations. By categegorising between the data by transect and location, the plot can reveal patterns, such as consistent increases or decreases in prevalence, or fluctuations in response to specific events or environmental changes. It could also be used to identify which transects or locations show higher susceptibility to Cliona over time.

Identifying Environmental Event Impact on Cliona Prevalence

  1. Add binary or categorical variables to indicate these events
envvariable <- transectprev %>%
  mutate(Bleaching = if_else(SampleYear %in% c(2010, 2023), 1, 0),
         Hurricane = if_else(SampleYear %in% c(2017, 2019), 1, 0),
         SCTLD = if_else(SampleYear == 2019, 1, 0))

In order to run a linear mixed effects model, i need the average size of corals grouped by site and year. But 2003-2005 does not have any measurements in the width column. meaning i must cut 2003-2005 out of this section of analysis.

clionawidth<-raw_clio
clionawidth <- clionawidth[!is.na(clionawidth$Width), ]
coral_size_summary <- clionawidth %>%
  group_by(Location, SampleYear, Transect) %>%
  summarise(mean_volume = mean(Length * Width * Height, na.rm = TRUE),
            mean_surface_area = mean(2 * (Length * Width + Width * Height + Height * Length), na.rm = TRUE),
            .groups = "drop")
   
coral_size_summary <- coral_size_summary %>%
  filter(!grepl("[A-Za-z]", Transect))

coral_size_summary <- coral_size_summary %>%
  filter(!Transect %in% c("7", "8"))

coral_size_summary <- coral_size_summary %>%
  mutate(SampleYear = as.factor(SampleYear))

envvariable<- envvariable %>%
  mutate(SampleYear = as.factor(SampleYear))

envvariable <- envvariable %>%
  left_join(coral_size_summary, by = c("Location", "SampleYear", "Transect"))
  1. Running a linear mixed-effects model

Accounts for random effects

#You're gonna want to add temperature to the model. but that is gonna take a second
envvariable <- envvariable %>%
  mutate(mean_volume_scaled = scale(mean_volume))


lmem <- lmer(prev ~ Bleaching + Hurricane + SCTLD + mean_volume_scaled +
               (1 | Location) + (1 | SampleYear), 
             data = envvariable)
summary(lmem)
## Linear mixed model fit by REML ['lmerMod']
## Formula: prev ~ Bleaching + Hurricane + SCTLD + mean_volume_scaled + (1 |  
##     Location) + (1 | SampleYear)
##    Data: envvariable
## 
## REML criterion at convergence: -7625.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3206 -0.4851 -0.2305  0.1790  9.9318 
## 
## Random effects:
##  Groups     Name        Variance  Std.Dev.
##  Location   (Intercept) 0.0005761 0.02400 
##  SampleYear (Intercept) 0.0001662 0.01289 
##  Residual               0.0035541 0.05962 
## Number of obs: 2778, groups:  Location, 33; SampleYear, 17
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)         0.032443   0.005580   5.814
## Bleaching           0.015383   0.014223   1.082
## Hurricane           0.003857   0.014090   0.274
## SCTLD              -0.013429   0.019217  -0.699
## mean_volume_scaled  0.004516   0.001212   3.727
## 
## Correlation of Fixed Effects:
##             (Intr) Blchng Hurrcn SCTLD 
## Bleaching   -0.171                     
## Hurricane   -0.174  0.068              
## SCTLD        0.000  0.000 -0.683       
## mn_vlm_scld -0.012  0.006  0.011  0.002
# Extract rows used in the model
model_data <- model.frame(lmem)

# Generate predictions
envvariable$predicted <- NA # Initialize a column for predictions
envvariable$predicted[as.numeric(rownames(model_data))] <- predict(lmem)


ggplot(envvariable, aes(x = predicted, y = prev)) +
  geom_point(alpha = 0.5) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  labs(title = "Predicted vs Observed Cliona Prevalence",
       x = "Predicted Prevalence",
       y = "Observed Prevalence") +
  theme_minimal()

The results of the linear mixed-effects model indicate that coral size, represented by the scaled variable mean_volume_scaled, is a significant predictor of Cliona sponge prevalence. For every one standard deviation increase in coral size, Cliona prevalence increases by approximately 0.0045 units, highlighting the importance of larger corals as suitable habitats for Cliona sponges. In contrast, environmental events such as bleaching, hurricanes, and SCTLD showed no significant effects on Cliona prevalence in this dataset, suggesting these disturbances may not be primary drivers of Cliona colonization patterns.

The random effects for location and year showed minimal variability, indicating that most of the observed trends in Cliona prevalence are explained by the fixed effects, particularly coral size. While the model provides a good fit to the data, the lack of significance for environmental predictors implies that other factors, beyond these disturbances, might better explain spatial and temporal variations in Cliona prevalence.

  1. Running a Generalized Linear Model

Does not include random effects, meaning all variability is attributed to the fixed effects

# Run a Generalized Linear Model (GLM)
glm_model <- glm(prev ~ Bleaching + Hurricane + SCTLD + mean_volume_scaled,
                 data = envvariable, family = quasibinomial)
summary(glm_model)
## 
## Call:
## glm(formula = prev ~ Bleaching + Hurricane + SCTLD + mean_volume_scaled, 
##     family = quasibinomial, data = envvariable)
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -3.32425    0.04043 -82.227  < 2e-16 ***
## Bleaching           0.34027    0.13519   2.517   0.0119 *  
## Hurricane           0.05151    0.14105   0.365   0.7150    
## SCTLD              -0.49597    0.21618  -2.294   0.0219 *  
## mean_volume_scaled  0.09563    0.01871   5.111 3.41e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.1220545)
## 
##     Null deviance: 265.71  on 2777  degrees of freedom
## Residual deviance: 261.60  on 2773  degrees of freedom
##   (42 observations deleted due to missingness)
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6
# Extract rows used in the model
model_data <- model.frame(glm_model)

# Generate predicted probabilities
predicted_values <- predict(glm_model, type = "response")

# Add a predicted column with NA for unmatched rows
envvariable$predicted <- NA
envvariable$predicted[as.numeric(rownames(model_data))] <- predicted_values


# Plot predicted vs observed prevalence
ggplot(envvariable, aes(x = predicted, y = prev)) +
  geom_point(alpha = 0.5) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  labs(title = "Predicted vs Observed Cliona Prevalence",
       x = "Predicted Prevalence",
       y = "Observed Prevalence") +
  theme_minimal()

# Effect of Bleaching
ggplot(envvariable, aes(x = factor(Bleaching), y = predicted)) +
  geom_boxplot() +
  labs(title = "Effect of Bleaching on Predicted Cliona Prevalence",
       x = "Bleaching Event (0 = No, 1 = Yes)",
       y = "Predicted Prevalence") +
  theme_minimal()

The GLM results indicate that both Bleaching and SCTLD significantly influence Cliona prevalence, while Hurricane does not. Specifically, bleaching events are associated with a significant increase in Cliona prevalence (Estimate=0.34027,𝑝=0.0119), suggesting that bleaching may create favorable conditions for Cliona colonization. Conversely, SCTLD has a significant negative association (Estimate=−0.49597,𝑝=0.0219), implying that disease outbreaks may reduce Cliona prevalence, possibly due to mortality or competitive dynamics. Coral size, represented by the scaled variable mean_volume_scaled, is also highly significant (Estimate=0.09563,𝑝=3.41×10 −7), with larger corals being more likely to host Cliona sponges. The intercept (Estimate=−3.32425, p<2×10−16) reflects the baseline log-odds of Cliona prevalence in the absence of environmental events and at the average coral size. The model explains a small reduction in deviance from the null model (Null Deviance = 265.71 to Residual Deviance = 261.60), suggesting moderate explanatory power. Overall, this analysis highlights coral size as the strongest predictor of Cliona prevalence, with environmental disturbances like bleaching and SCTLD also playing significant but contrasting roles.

ggplot(envvariable, aes(x = mean_volume_scaled)) + geom_histogram(binwidth = 0.5) + labs(title = "Histogram of Coral Volume (Scaled)")
## Warning: Removed 42 rows containing non-finite outside the scale range
## (`stat_bin()`).

ggplot(envvariable, aes(x = prev)) + geom_histogram(binwidth = 0.02) + labs(title = "Histogram of Cliona Prevalence")

Compare the fixed effects (coefficients) for predictors like Bleaching, SCTLD, and mean_volume_scaled in both models to check for consistency

# Extract coefficients for GLM
glm_coef <- coef(summary(glm_model))

# Extract coefficients for LME
lme_coef <- fixef(lmem)

# Print side-by-side comparison
glm_coef
##                       Estimate Std. Error     t value     Pr(>|t|)
## (Intercept)        -3.32424753 0.04042782 -82.2267336 0.000000e+00
## Bleaching           0.34026517 0.13518526   2.5170287 1.189088e-02
## Hurricane           0.05151373 0.14104916   0.3652183 7.149763e-01
## SCTLD              -0.49596527 0.21618193  -2.2942031 2.185325e-02
## mean_volume_scaled  0.09563155 0.01870914   5.1114889 3.413617e-07
lme_coef
##        (Intercept)          Bleaching          Hurricane              SCTLD 
##        0.032443453        0.015383190        0.003856983       -0.013428680 
## mean_volume_scaled 
##        0.004516402

Examine the residuals to ensure the models fit the data

GLM Residuals

plot(glm_model, which = 1)  # Residuals vs. Fitted

LME Residuals

plot(lmem)  # Residuals for LME

qqnorm(resid(lmem))  # Q-Q plot
qqline(resid(lmem))

Spencer Parr

Data Exclusion-Coral Health/Detection Probability

This Code provides reasoning to why excluding coral species with minimal total observations is needed for accurate data analysis

  1. After data manipulation stage (which is fully described in “Cliona Prevalence- Coral Health Data” section) the first step is to create a binary presence column for Cliona. which creates a new column, indicating whether a Cliona sponge was detected for every single coral observation beginning in 2005-2024, this accumulates to 56,605 total observations. Extracts only species (SPP) and Cliona presence (presence).
#Subset into a data frame that only has SPP and Cliona presence 

cliona_subset <- cliona %>% select(SPP, presence)
  1. Next I group the data by coral species (SPP), counts the total number of observations per species (n()), and arranges species in descending order by total observations.
# Count the total observations per coral species
species_counts <- cliona_subset %>%
  group_by(SPP) %>%
  summarise(total_observations = n()) %>%
  arrange(desc(total_observations))
  1. Then I determined the proportion of observations each species represents. Which is the total observations of each coral species divided by the the total of all coral observations (56,605)
# Calculate the proportion of observations per species
species_proportion <- species_counts %>%
  mutate(proportion_observed = total_observations / sum(total_observations))
  1. Next, I count how many times Cliona is present for each species. Logic: presence = 1 (Cliona detected), presence = 0 (No Cliona detected), na.rm = TRUE ensures that missing values don’t affect calculations.
#Count the Total Cliona Occurrences Per Species

cliona_presence_counts <- cliona_subset %>%
  group_by(SPP) %>%
  summarise(cliona_present = sum(presence, na.rm = TRUE))
  1. Next step is to merge species_counts (total observations per species) with cliona_presence_counts (total Cliona occurrences). And calculate the proportion of Cliona presence per species (cliona_present / total_observations).
# Merge with total species observations
species_cliona_prob <- species_counts %>%
  left_join(cliona_presence_counts, by = "SPP") %>%
  mutate(cliona_probability = cliona_present / total_observations)

The Next Steps are on running Power Analysis

The dataset contains observations of Cliona prevalence across multiple coral species. However, some species may have very few observations, which makes statistical inference unreliable. If a species has too few observations, we might miss detecting Cliona even if it is present which is a Type II error (failing to reject a false null hypothesis). If a species has sufficient observations, we can be statistically confident in estimating Cliona prevalence. Thus, power analysis helps determine how many observations are needed per species to ensure reliable estimates of Cliona prevalence.

  1. (Cohen’s h) is a statistical measure of the difference between two proportions. It is calculated using the arcsine transformation, which stabilizes variance when dealing with proportions. The formula used is:

\[ h = 2 \times \left( \asin \left( \sqrt{p_1} \right) - \asin \left( \sqrt{p_2} \right) \right) \]

  1. Next step is to define the smallest Cliona prevalence worth detecting. The variable p1 is set to 1% (0.01) prevalence.
# Define expected prevalence rates
p1 <- 0.01  # Smallest Cliona prevalence worth detecting
  1. Then, we need to calculate the average prevalence of Cliona across all species. The variable p2 is computed as the mean Cliona prevalence across all coral species where Cliona has been observed at least once. The na.rm = TRUE ensures that missing values do not interfere with the calculation.
#the lowest prevalence of cliona that has more than 1 observation for that coral species
p2 <- mean(species_cliona_prob$cliona_probability[species_cliona_prob$cliona_probability > 0], na.rm = TRUE)
  1. Next we define the arcsign coefficient equation an run the power analysis. The function pwr.p.test() is used to determine how many observations per species are needed to detect Cliona with 80% power (power = 0.8) at a 5% significance level (sig.level = 0.05). The absolute value of h_effect_size is used as the effect size in the calculation.
# Calculate Cohen's h effect size
h_effect_size <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Run power analysis to find the required sample size per species
power_result <- pwr.p.test(h = abs(h_effect_size), sig.level = 0.05, power = 0.8)

The result of the power analysis suggests that at least 157 observations per species are needed to make a statistically confident estimate of Cliona prevalence.

print(power_result$n)
## [1] 350.2433
  1. Once we got a result, we need to set a threshold to filter species with insufficient observations. The threshold for filtering species is determined using ceiling(power_result$n), which rounds up the required sample size to the nearest whole number.
# Define threshold based on power analysis
min_observations_threshold <- ceiling(power_result$n)
  1. Once a minimum threshold is established we can then remove species that do not meet the required sample size. This makes it so that species with very few observations do not introduce noise into the analysis.
# Filter out species with too few observations
filtered_species <- species_cliona_prob %>%
  filter(total_observations >= ceiling(power_result$n))
  1. Next is to load in site depth data from an external file and filter the cliona dataset so that only the species that passed the previous filtering step (those with a sufficient number of observations) are kept. Then summarize Cliona prevalence by location and species. And finally merge the depth and cliona data set.
# Load site depth data (Assuming depth data is in "SiteMetadata.csv")
site_metadata <- read_csv("TCRMP_SiteMetadata.csv") %>%
  select(Location, Depth)  

# Filter Cliona data to keep only chosen species
cliona_filtered <- cliona %>%
  filter(SPP %in% filtered_species$SPP)

# Summarize by **Location AND Coral Species (SPP)**
location_species_summary <- cliona_filtered %>%
  group_by(Location, SPP) %>%
  summarise(
    Total_Abundance = n(),  
    Cliona_Presence = sum(presence, na.rm = TRUE), 
    Total_Surveys = n(), 
    Cliona_Prevalence = Cliona_Presence / Total_Surveys  
  ) %>%
  ungroup()

# Merge Depth Data (Ensuring Each Location Retains Its Depth)
location_species_summary <- location_species_summary %>%
  left_join(site_metadata, by = "Location")  
  1. the next step is to visualize this data. I used two different methods to visualize the if depth has an impact on Cliona prevalence.
ggplot(location_species_summary, aes(x = Depth, y = Cliona_Prevalence, color = SPP)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  labs(title = "Cliona Prevalence vs. Depth by Coral Species",
       x = "Depth (m)",
       y = "Cliona Prevalence",
       color = "Coral Species") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Species Specific at Nearshore DEEP STX & Offshore DEEP STT/STJ

# Load site metadata (Depth category and region info)
site_metadata <- read_csv("TCRMP_SiteMetadata.csv") %>%
  select(Location, Depth, ReefComplex, Island)  # Include ReefComplex and Island

# Merge Cliona data with site metadata
cliona_filtered <- location_species_summary %>%
  left_join(site_metadata, by = "Location")

# **Split Data by Region: STT/STJ vs. STX**
cliona_filtered <- cliona_filtered %>%
  mutate(ReefComplex_Region = case_when(
    Island == "STT" | Island == "STJ" ~ paste0("STT/STJ - ", ReefComplex),
    Island == "STX" ~ paste0("STX - ", ReefComplex),
    TRUE ~ "Other"
  ))

Compare Cliona Prevalence at Nearshore Deep vs. Offshore Deep Sites

deep_sites <- cliona_filtered %>%
  filter(ReefComplex %in% c("Nearshore", "Offshore") & Depth.x >= 15)  # Adjust depth threshol


# Plot Cliona prevalence by ReefComplex
ggplot(deep_sites, aes(x = ReefComplex, y = Cliona_Prevalence, fill = ReefComplex)) +
  geom_boxplot(alpha = 0.6) +
  facet_wrap(~Island) +  # Separate by region (St. Thomas/St. John vs. St. Croix)
  labs(title = "Cliona Prevalence in Nearshore vs. Offshore Deep Sites",
       x = "Reef Complex",
       y = "Cliona Prevalence") +
  theme_minimal()

Plot Cliona Prevalence by Reef Complex (STT/STJ vs. STX) and Coral Species

ggplot(deep_sites, aes(x = ReefComplex_Region, y = Cliona_Prevalence, fill = ReefComplex_Region)) +
  geom_boxplot(alpha = 0.6) +
  facet_wrap(~SPP, scales = "free_y") +  # Separate by coral species
  labs(title = "Cliona Prevalence by Reef Complex and Coral Species",
       x = "Reef Complex (Region)",
       y = "Cliona Prevalence",
       fill = "Reef Complex (Region)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Check the dataset
head(deep_sites)
## # A tibble: 6 × 11
##   Location SPP   Total_Abundance Cliona_Presence Total_Surveys Cliona_Prevalence
##   <chr>    <chr>           <int>           <dbl>         <int>             <dbl>
## 1 Buck Is… AA                282               0           282               0  
## 2 Buck Is… AG                 13               0            13               0  
## 3 Buck Is… AGSP                9               0             9               0  
## 4 Buck Is… AH                 30               0            30               0  
## 5 Buck Is… AL                 10               1            10               0.1
## 6 Buck Is… MC                 32               0            32               0  
## # ℹ 5 more variables: Depth.x <dbl>, Depth.y <dbl>, ReefComplex <chr>,
## #   Island <chr>, ReefComplex_Region <chr>

ANOVA + Tukey HSD

ANOVA: Does prevalence differ by reef complex region? F(2, 120) = 10.45, p < 0.001 #There is a statistically significant difference in Cliona prevalence between at least two of the reef complex region groups.

# ANOVA: Does prevalence differ by reef complex region?
anova_result <- aov(Cliona_Prevalence ~ ReefComplex_Region, data = deep_sites)
summary(anova_result)
##                     Df  Sum Sq  Mean Sq F value   Pr(>F)    
## ReefComplex_Region   2 0.02478 0.012389   10.45 6.54e-05 ***
## Residuals          120 0.14225 0.001185                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Post-hoc test: Tukey HSD
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Cliona_Prevalence ~ ReefComplex_Region, data = deep_sites)
## 
## $ReefComplex_Region
##                                            diff         lwr         upr
## STX - Nearshore-STT/STJ - Offshore  0.041878078  0.01981597  0.06394019
## STX - Offshore-STT/STJ - Offshore   0.002714257 -0.01416095  0.01958946
## STX - Offshore-STX - Nearshore     -0.039163820 -0.06331857 -0.01500907
##                                        p adj
## STX - Nearshore-STT/STJ - Offshore 0.0000460
## STX - Offshore-STT/STJ - Offshore  0.9228867
## STX - Offshore-STX - Nearshore     0.0005610

STX Nearshore sites have significantly higher Cliona prevalence compared to both STX Offshore and STT/STJ Offshore reefs.There is no significant difference between STX Offshore and STT/STJ Offshore, suggesting the offshore zones across islands are more similar in Cliona prevalence.This supports a depth × reef zone interaction hypothesis — perhaps driven by environmental gradients (e.g., light, nutrients, stress) unique to STX nearshore reefs.

Plot Anova

# Load necessary libraries
pacman::p_load(ggplot2, dplyr, multcompView, agricolae, tidyr)

# Run ANOVA
aov_result <- aov(Cliona_Prevalence ~ ReefComplex_Region, data = deep_sites)

# Run Tukey HSD and assign group letters
hsd_letters <- HSD.test(aov_result, "ReefComplex_Region", group = TRUE)

# View the groups with assigned letters
print(hsd_letters$groups)
##                    Cliona_Prevalence groups
## STX - Nearshore           0.05300301      a
## STX - Offshore            0.01383919      b
## STT/STJ - Offshore        0.01112493      b
# Calculate means and standard errors for each group
means <- deep_sites %>%
  group_by(ReefComplex_Region) %>%
  summarise(
    mean = mean(Cliona_Prevalence, na.rm = TRUE),
    se = sd(Cliona_Prevalence, na.rm = TRUE) / sqrt(n())
  ) %>%
  # Merge HSD group letters
  left_join(
    hsd_letters$groups %>% rownames_to_column("ReefComplex_Region"),
    by = "ReefComplex_Region"
  )
ggplot(means, aes(x = ReefComplex_Region, y = mean, fill = ReefComplex_Region)) + geom_bar(stat = "identity", width = 0.6, alpha = 0.7, color = "black") + geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) + geom_text(aes(label = groups, y = mean + se + 0.01), vjust = 0, size = 5) + labs(title = "Cliona Prevalence by Reef Complex Region", x = "Reef Complex Region", y = "Mean Cliona Prevalence") + theme_minimal() + theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(size = 16, face = "bold"))

GLM with Interaction Term: Depth * ReefComplex

glm_interaction <- glm(Cliona_Prevalence ~ Depth.x * ReefComplex_Region, family = quasibinomial, data = cliona_filtered)

summary(glm_interaction)
## 
## Call:
## glm(formula = Cliona_Prevalence ~ Depth.x * ReefComplex_Region, 
##     family = quasibinomial, data = cliona_filtered)
## 
## Coefficients:
##                                                Estimate Std. Error t value
## (Intercept)                                   -7.419731   3.123971  -2.375
## Depth.x                                        0.079950   0.084636   0.945
## ReefComplex_RegionSTT/STJ - Nearshore          2.689852   3.323834   0.809
## ReefComplex_RegionSTT/STJ - Offshore           5.474276   3.219854   1.700
## ReefComplex_RegionSTX - Mesophotic            -2.824363   4.852053  -0.582
## ReefComplex_RegionSTX - Nearshore              3.243838   3.183047   1.019
## ReefComplex_RegionSTX - Offshore               5.688702   3.161348   1.799
## Depth.x:ReefComplex_RegionSTT/STJ - Nearshore  0.092521   0.168455   0.549
## Depth.x:ReefComplex_RegionSTT/STJ - Offshore  -0.218290   0.101113  -2.159
## Depth.x:ReefComplex_RegionSTX - Mesophotic     0.087771   0.137030   0.641
## Depth.x:ReefComplex_RegionSTX - Nearshore     -0.004156   0.097509  -0.043
## Depth.x:ReefComplex_RegionSTX - Offshore      -0.191551   0.094033  -2.037
##                                               Pr(>|t|)  
## (Intercept)                                     0.0179 *
## Depth.x                                         0.3453  
## ReefComplex_RegionSTT/STJ - Nearshore           0.4187  
## ReefComplex_RegionSTT/STJ - Offshore            0.0897 .
## ReefComplex_RegionSTX - Mesophotic              0.5607  
## ReefComplex_RegionSTX - Nearshore               0.3086  
## ReefComplex_RegionSTX - Offshore                0.0725 .
## Depth.x:ReefComplex_RegionSTT/STJ - Nearshore   0.5831  
## Depth.x:ReefComplex_RegionSTT/STJ - Offshore    0.0313 *
## Depth.x:ReefComplex_RegionSTX - Mesophotic      0.5221  
## Depth.x:ReefComplex_RegionSTX - Nearshore       0.9660  
## Depth.x:ReefComplex_RegionSTX - Offshore        0.0421 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.1060844)
## 
##     Null deviance: 41.230  on 559  degrees of freedom
## Residual deviance: 35.309  on 548  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 8

In offshore reefs (STT/STJ and STX), Cliona prevalence decreases significantly with depth, as shown by the negative and significant interaction terms.The effect of depth is not significant, meaning Cliona may be more uniformly distributed across depths in those zones. This supports the hypothesis that reef type moderates the impact of depth on Cliona.

pacman::p_load(emmeans, broom)

# Fit interaction model (already done)

glm_interaction <- glm(Cliona_Prevalence ~ Depth.x * ReefComplex_Region, family = quasibinomial, data = cliona_filtered) 

Compare with simpler model

glm_simple <- glm(Cliona_Prevalence ~ Depth.x + ReefComplex_Region, family = quasibinomial, data = cliona_filtered)

anova(glm_simple, glm_interaction, test = "Chisq") # Likelihood ratio test
## Analysis of Deviance Table
## 
## Model 1: Cliona_Prevalence ~ Depth.x + ReefComplex_Region
## Model 2: Cliona_Prevalence ~ Depth.x * ReefComplex_Region
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)    
## 1       553     37.514                         
## 2       548     35.309  5   2.2051 0.000889 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model with the Depth.x * ReefComplex_Region interaction is significantly better than the simpler additive model. This confirms that the effect of depth depends on the reef zone/region, and you were right to include that interaction term.

Estimated marginal means

emm <- emmeans(glm_interaction, ~ ReefComplex_Region | Depth.x, at = list(Depth.x = c(5, 15, 30)))


summary(emm)
## Depth.x =  5:
##  ReefComplex_Region   emmean    SE  df asymp.LCL asymp.UCL
##  STT/STJ - Mesophotic -7.020 2.700 Inf    -12.32   -1.7187
##  STT/STJ - Nearshore  -3.868 0.429 Inf     -4.71   -3.0267
##  STT/STJ - Offshore   -2.637 0.520 Inf     -3.66   -1.6180
##  STX - Mesophotic     -9.405 3.180 Inf    -15.63   -3.1785
##  STX - Nearshore      -3.797 0.385 Inf     -4.55   -3.0425
##  STX - Offshore       -2.289 0.308 Inf     -2.89   -1.6846
## 
## Depth.x = 15:
##  ReefComplex_Region   emmean    SE  df asymp.LCL asymp.UCL
##  STT/STJ - Mesophotic -6.220 1.870 Inf     -9.89   -2.5520
##  STT/STJ - Nearshore  -2.143 1.080 Inf     -4.25   -0.0347
##  STT/STJ - Offshore   -4.021 0.232 Inf     -4.48   -3.5659
##  STX - Mesophotic     -7.728 2.110 Inf    -11.87   -3.5904
##  STX - Nearshore      -3.039 0.226 Inf     -3.48   -2.5964
##  STX - Offshore       -3.405 0.259 Inf     -3.91   -2.8965
## 
## Depth.x = 30:
##  ReefComplex_Region   emmean    SE  df asymp.LCL asymp.UCL
##  STT/STJ - Mesophotic -5.021 0.686 Inf     -6.37   -3.6762
##  STT/STJ - Nearshore   0.444 3.250 Inf     -5.93    6.8168
##  STT/STJ - Offshore   -6.096 0.936 Inf     -7.93   -4.2605
##  STX - Mesophotic     -5.212 0.598 Inf     -6.38   -4.0406
##  STX - Nearshore      -1.902 0.886 Inf     -3.64   -0.1661
##  STX - Offshore       -5.079 0.809 Inf     -6.67   -3.4925
## 
## Results are given on the logit (not the response) scale. 
## Confidence level used: 0.95
emmip(glm_interaction, ReefComplex_Region ~ Depth.x, cov.reduce = range, CIs = TRUE) + labs(title = "Estimated Cliona Prevalence by Reef Complex Region", x = "Depth (m)", y = "Estimated Cliona Prevalence") + theme_minimal()

All values are on the logit scale (log-odds of Cliona prevalence). As depth increases, Cliona prevalence (logit) drops sharply in most reef complexes, particularly STT/STJ - Offshore and STX - Offshore.Some groups (like STT/STJ - Nearshore at 30 m) show very wide CIs, suggesting high uncertainty (likely due to sparse data there).

back-transform the logits to prevalence

Back-transforming logits to probability scale

emm_df <- as.data.frame(emm)
emm_df$probability <- plogis(emm_df$emmean)

Plot

ggplot(emm_df, aes(x = as.factor(Depth.x), y = probability, fill = ReefComplex_Region)) + geom_col(position = position_dodge(width = 0.8)) + geom_errorbar(aes(ymin = plogis(asymp.LCL), ymax = plogis(asymp.UCL)), width = 0.2, position = position_dodge(width = 0.8)) + labs(title = "Estimated Cliona Prevalence by Depth and Reef Complex Region", x = "Depth (m)", y = "Estimated Prevalence") + theme_minimal() + theme(axis.text.x = element_text(size = 12), plot.title = element_text(hjust = 0.5, face = "bold"))

This plot visualizes estimated Cliona prevalence across different depths (5, 15, and 30 meters) and reef complex regions, showing how sponge prevalence varies with both reef zone and geographic region (STT/STJ vs. STX) Cliona prevalence is generally highest in shallow offshore sites, particularly at 5–15 m. Mesophotic and some nearshore zones have high uncertainty—indicating a need for more targeted sampling in those regions.

Cliona Prevalence by Reef Complex

This code shows the process of cleaning, graphing, and analyzing the coral health data in order to find if there is any significant difference in Cliona prevalence between Nearshore, Offshore, and Mesophotic sites.

  1. revert back to Coral Prevalence- Coral Health Data tab, steps 1-5 are identical

    #1
    cliona <- read_csv("../Cliona.csv")%>%
      select(Period,SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, Spo2)%>%
      #2
      mutate(presence=if_else(Cliona>0 | Spo1ID %in% c("CLSP", "BOSP") | Spo2ID %in% c("CLSP", "BOSP"),1,0))%>%
      mutate(presence=replace_na(presence,0))%>%
      #3
      filter(Period%in% c("Annual","PostBL")&SampleType%in% c("Permanent","permanent")&Method%in% c("intercept","50 cm belt"))%>%
      filter(!grepl("A",Transect))%>%
      filter(SampleYear>=2005)%>%
      #4
      mutate(ID=1:nrow(.),
             SampleYear=as.factor(SampleYear),
             Location=as.factor(Location),
             Transect=as.factor(Transect))
    #5
    transectprev <- cliona %>%
      group_by(Location, SampleYear, Transect) %>%
      summarise(prev = mean(presence), freq = sum(presence), total = n())%>%
      ungroup()%>%
      mutate(ID= 1:nrow(.))
  2. Read in the TCRMP site metadata csv. This data frame has 8 columns: Location (e.g., Coral Bay), Code (e.g., CRB), Island (e.g., STJ), Latitude (e.g., 18.338), Longitude (e.g., -64.704), ReefComplex (e.g., Nearshore), Depth (e.g., 9), and YearAdded (e.g., 2011)

    reef_complex <- read_csv("../SiteMetadata.csv")
  3. Next I merge the transectprev data frame with the reef_complex data frame, adding reef complex information (e.g., Nearshore, Offshore, Mesophotic) to the corresponding locations in transectprev based on the Location column.

    Reef_Types <- transectprev %>%
      left_join(reef_complex, by = "Location")
  4. This code groups the Reef_Types data frame by Location and ReefComplex, calculates the mean prevalence (prev) for each group while ignoring missing values, and then converts ReefComplex into a factor with the levels ordered as “Nearshore,” “Offshore,” and “Mesophotic.”

    reef_sum <- Reef_Types %>%
      group_by(Location, ReefComplex) %>%
      summarise(
        mean_prevalence = mean(prev, na.rm = TRUE)  # Calculate mean prevalence
      )%>%
      mutate( ReefComplex = factor(ReefComplex, levels = c("Nearshore", "Offshore", "Mesophotic")))
  5. This code generates a boxplot to visualize the mean prevalence of Cliona sponges across different reef complexes using the reef_sum dataset. It maps ReefComplex to the x-axis, mean_prevalence to the y-axis, and uses colors to distinguish reef complexes.

    ggplot(reef_sum, aes(x = ReefComplex, y = mean_prevalence, fill = ReefComplex)) +
      geom_boxplot(outlier.color = "red", outlier.shape = 16) +
      geom_jitter(width = 0.2, size = 2, alpha = 0.6) +  # Add jitter for data points
      labs(
        title = "Cliona Sponge Prevalence Across Reef Complexes",
        x = "Reef Complex",
        y = "Mean Prevalence"
      ) +
      theme_minimal() +
      theme(
        legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1)
      )

  6. Next we run a one-way Analysis of Variance (ANOVA) to test whether there is a statistically significant difference in mean Cliona sponge prevalence across different ReefComplex categories (Nearshore, Offshore, and Mesophotic).

    # Run ANOVA
    anova_result <- aov(mean_prevalence ~ ReefComplex, data = reef_sum)

    The ANOVA results indicate a statistically significant difference in the mean prevalence of Cliona sponge coverage across the three reef complexes (Nearshore, Offshore, and Mesophotic). The ReefComplex factor accounts for a Sum of Squares (Sum Sq) of 0.005884, while the residual variation (unexplained variance) accounts for 0.013323. The Mean Square (Mean Sq) for ReefComplex is 0.0029420, and the F-statistic is 6.625, with an associated p-value (Pr(>F)) of 0.00414. This p-value is below the conventional threshold of 0.05, suggesting that the differences in Cliona prevalence among the reef complexes are statistically significant.

    # Create a ANOVA table
    broom::tidy(anova_result) %>%
      gt() %>%
      tab_header(
        title = "ANOVA Results for Cliona Sponge Prevalence Across Reef Complexes"
      )
    ANOVA Results for Cliona Sponge Prevalence Across Reef Complexes
    term df sumsq meansq statistic p.value
    ReefComplex 2 0.005876101 0.002938051 6.605047 0.004197984
    Residuals 30 0.013344571 0.000444819 NA NA
  7. Next step is the TukeyHSD (Tukey’s Honest Significant Differences) test which is a post-hoc analysis used to identify which specific group pairs have significant differences in their means. It adjusts for multiple comparisons to control the family-wise error rate, ensuring that conclusions about group differences are not overstated.

    tukey_result <- TukeyHSD(anova_result)

    The TukeyHSD test shows a significant difference in Cliona sponge prevalence between Nearshore and Mesophotic reefs, with Nearshore having a higher mean prevalence (difference = 0.0331, p = 0.003). No significant differences were found between Offshore and Mesophotic reefs (p = 0.207) or Offshore and Nearshore reefs (p = 0.151). These results indicate that Nearshore reefs have notably higher prevalence compared to Mesophotic reefs, while other comparisons show no meaningful differences.

    # Create a TukeyHSD table
    broom::tidy(tukey_result) %>%
      gt() %>%
      tab_header(
        title = "TukeyHSD Results for Cliona Sponge Prevalence Across Reef Complexes"
      )
    TukeyHSD Results for Cliona Sponge Prevalence Across Reef Complexes
    term contrast null.value estimate conf.low conf.high adj.p.value
    ReefComplex Offshore-Nearshore 0 -0.01635689 -0.03765759 0.004943818 0.158150374
    ReefComplex Mesophotic-Nearshore 0 -0.03307402 -0.05562030 -0.010527744 0.003020255
    ReefComplex Mesophotic-Offshore 0 -0.01671713 -0.04008687 0.006652605 0.199058088

    We still need to check for normality of residuals and homogeneity of variances. For normality a Shapiro-Wilk statistic will be ran, and for homogeneity of variances an Levene’s test will be ran.

  8. The Shapiro-Wilk test produced a W statistic of 0.965 and a p-value of 0.3555. Since the p-value is greater than the conventional threshold of 0.05, we fail to reject the null hypothesis that the residuals are normally distributed. This indicates that the assumption of normality for the ANOVA model’s residuals is satisfied, supporting the validity of the ANOVA results.

    shapiro<-shapiro.test(residuals(anova_result))
    
    # Create a shapiro table
    broom::tidy(shapiro) %>%
      gt() %>%
      tab_header(
        title = "Shapiro-Wilk Results for Cliona Sponge Prevalence Across Reef Complexes"
      )
    Shapiro-Wilk Results for Cliona Sponge Prevalence Across Reef Complexes
    statistic p.value method
    0.9635491 0.3246365 Shapiro-Wilk normality test
  9. The Q-Q plot shows that the residuals align closely with the red reference line, indicating that they are approximately normally distributed. The histogram demonstrates that the residuals are symmetrically distributed around zero. And the density plot compares the residuals’ distribution (blue line) to a theoretical normal curve (red dashed line), showing strong alignment with minor deviations. These plots confirm that the residuals satisfy the normality assumption required for ANOVA, supporting the validity of the statistical analysis.

    # Generate individual plots
    qq_plot <- ggplot(data.frame(residuals = residuals(anova_result)), aes(sample = residuals)) +
      stat_qq() + stat_qq_line(color = "red") + 
      labs(title = "Q-Q Plot of Residuals")
    
    hist_plot <- ggplot(data.frame(residuals = residuals(anova_result)), aes(x = residuals)) +
      geom_histogram(binwidth = 0.005, fill = "lightblue", color = "black") + 
      labs(title = "Histogram of Residuals", x = "Residuals", y = "Frequency") +
      theme_minimal()
    
    density_plot <- ggplot(data.frame(residuals = residuals(anova_result)), aes(x = residuals)) +
      geom_density(color = "blue", fill = "lightblue", alpha = 0.5) +
      stat_function(fun = dnorm, args = list(mean = mean(residuals(anova_result)), 
                                             sd = sd(residuals(anova_result))),
                    color = "red", linetype = "dashed") +
      labs(title = "Density Plot with Normal Curve", x = "Residuals", y = "Density")
    
    # Arrange plots
    grid.arrange(qq_plot, hist_plot, density_plot, nrow = 1)

  10. The Levene’s test results show an F-value of 2.3661 with a p-value of 0.1111. Since the p-value is greater than the threshold of 0.05, we fail to reject the null hypothesis that the group variances are equal. This indicates that the assumption of homogeneity of variance is satisfied, supporting the appropriateness of using ANOVA for this data.

    levene<-leveneTest(mean_prevalence ~ ReefComplex, data = reef_sum)
    
    # Create a Levene table
    broom::tidy(levene) %>%
      gt() %>%
      tab_header(
        title = "Levene's Test Results for Cliona Sponge Prevalence Across Reef Complexes"
      )
    Levene's Test Results for Cliona Sponge Prevalence Across Reef Complexes
    statistic p.value df df.residual
    2.550193 0.09487255 2 30
  11. Finally, we need to visualize the original mean prevalence between reef complexes but now with TukeyHSD multcompletters. First, the TukeyHSD function is applied to the ANOVA model to compute pairwise comparisons of means for the “ReefComplex” variable, identifying which groups differ significantly. The multcompView::multcompLetters function then assigns grouping letters (e.g., “a”, “b”) based on adjusted p-values, indicating which groups are statistically similar or different. These labels are converted into a data frame, with the “ReefComplex” levels and their respective group letters for easy handling. Finally, the left_join function integrates the Tukey test results (group letters) into the reef_sum dataset.

    ####visualise the resutls#######
    
    # Extract Tukey test results
    tukey_result <- TukeyHSD(anova_result, "ReefComplex")
    
    # Convert Tukey results to group letters
    library(multcompView)
    tukey_labels <- multcompLetters(tukey_result$ReefComplex[, "p adj"])$Letters
    
    # Convert to a data frame for merging
    tukey_labels <- data.frame(
      ReefComplex = names(tukey_labels),
      group = tukey_labels
    )
    
    # Merge tukey_labels with your reef_sum data
    reef_sum <- reef_sum %>%
      left_join(tukey_labels, by = "ReefComplex")
  12. This code generates a boxplot comparing mean Cliona sponge prevalence across three reef complexes: Mesophotic, Nearshore, and Offshore, with statistical group labels from Tukey’s post-hoc test. The boxplot shows the median prevalence (horizontal line), spread (interquartile range and whiskers), and outliers (red dots). Statistical group labels (“a”, “b”, “ab”) indicate significant differences: Mesophotic reefs have significantly lower prevalence than Nearshore (p = 0.003), while Offshore overlaps with both. Nearshore reefs have the highest prevalence and variability, Mesophotic the lowest with narrow spread, and Offshore shows intermediate prevalence. This visualization highlights depth-related differences in sponge prevalence, suggesting ecological patterns tied to reef complex characteristics.

    # Plot with Tukey labels
    ggplot(reef_sum, aes(x = ReefComplex, y = mean_prevalence, fill = ReefComplex)) +
      geom_boxplot(outlier.color = "red", outlier.shape = 16) +
      geom_jitter(width = 0.2, size = 2, alpha = 0.6) +
      geom_text(
        aes(label = group, x = ReefComplex, y = max(mean_prevalence) + 0.01),
        data = reef_sum, inherit.aes = FALSE, color = "black", size = 4, vjust = -0.5
      ) +
      labs(
        title = "Cliona Sponge Prevalence Across Reef Complexes",
        x = "Reef Complex",
        y = "Mean Prevalence"
      ) +
      theme_minimal() +
      theme(
        legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1)
      )

Cliona Extent

This code explains the data management/manipulation and graphical evidence of Cliona extent for each TCRMP location

  1. The data is read into R using read_csv, and then selected columns are extracted, specifically: Period, SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, and Spo2. This selection reduces the dataset to only the columns relevant to this analysis.

  2. the code generates two new columns, CLSP and BOSP, which indicate coverage values for two types of sponges. Using the mutate and case_when functions, it examines the columns Spo1ID and Spo2ID to populate these new columns. If Spo1ID contains “CLSP”, then the value of Spo1 is assigned to the CLSP column. Similarly, if Spo2ID contains “CLSP”, then the value from Spo2 is assigned to CLSP. The same process is used to populate the BOSP column, using “BOSP” as the indicator. If neither condition is met in either case, the value is set to NA.

  3. The code then filters the data based on several criteria. First, it keeps only rows where Period is either “Annual” or “PostBL” and SampleType is either “Permanent” or “permanent”, ensuring consistency in sample type naming. Also, only rows where Method is either “intercept” or “50 cm belt” are retained, narrowing the dataset to specific sampling methods. And rows with certain letters in the Transect column (“A”, “R”, or “L”) are excluded using a filtering condition with grepl, and only data from 2005 onward (SampleYear >= 2005) is kept.

  4. Finally, the data is reshaped from a wide format to a long format using pivot_longer. This transformation takes the columns CLSP, BOSP, and Cliona and combines them into two new columns: SpongeType and Extent. The SpongeType column identifies which type of sponge coverage is being recorded (either CLSP, BOSP, or Cliona), and Extent contains the corresponding value of coverage for each type.

#1. 
clionaext <- read_csv("../Cliona.csv")%>%
  select(Period,SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, Spo2)%>%
  
#2.
  mutate( CLSP = case_when( Spo1ID == "CLSP" ~ Spo1, Spo2ID == "CLSP" ~ Spo2, TRUE ~ NA_real_),
    BOSP = case_when( Spo1ID == "BOSP" ~ Spo1, Spo2ID == "BOSP" ~ Spo2,
      TRUE ~ NA_real_)) %>%
  select(-Spo1, -Spo1ID, -Spo2, -Spo2ID)%>%
  
#3.
  filter(Period%in% c("Annual","PostBL")&SampleType%in% c("Permanent","permanent")&Method%in% c("intercept","50 cm belt"))%>%
  filter(!grepl("A", "R", "L",Transect))%>%
  filter(SampleYear>=2005)%>%
  mutate(ID=1:nrow(.),
         SampleYear=as.factor(SampleYear),
         Location=as.factor(Location),
         Transect=as.factor(Transect))%>%
  
#4. 
  pivot_longer(cols = c(CLSP, BOSP, Cliona), 
               names_to = "SpongeType", 
               values_to = "Extent")

For the first analysis, I ask the question: Has the extent of Cliona sponge coverage increased or decreased over time? Is there a pattern in prevalence by year?

  1. This code assigns the clionaext data frame to a new data frame named ext_time. It then updates ext_time by converting the SampleYear column from a factor to a numeric data type, ensuring that the year values are treated as continuous numbers rather than categorical labels.

    ext_time<-clionaext
    
    ext_time <- ext_time %>%
      mutate(SampleYear = as.numeric(as.character(SampleYear)))
  2. This code processes the clionaext data frame to calculate the total number of unique corals recorded in each year. First, it groups the data by SampleYear and uses n_distinct(ID) to count distinct ID values, representing unique coral observations. The resulting data frame, unique_coral_counts, contains SampleYear and total_unique_corals.

    Next, the SampleYear column is converted from a factor to numeric using mutate. This ensures the year values are treated as continuous numbers for analyses or visualizations that require numerical formatting.

unique_coral_counts <- clionaext %>%
  group_by(SampleYear) %>%
  summarise(total_unique_corals = n_distinct(ID)) 

unique_coral_counts <- unique_coral_counts%>%
  mutate(SampleYear = as.numeric(as.character(SampleYear)))
  1. This code creates a new data frame, ext_time_with_counts, by merging (left_join) the ext_time data frame with the unique_coral_counts data frame. The join is performed using the SampleYear column, which is common to both data frames.
ext_time_with_counts <- ext_time %>%
  left_join(unique_coral_counts, by = "SampleYear")
  1. This code processes the ext_time_with_counts data frame to summarize sponge coverage by year. It groups the data by SampleYear and calculates the total extent of sponge coverage (total_extent), retrieves the total unique coral count for each year (count_observations), and computes the weighted average extent (weighted_avg_extent) by dividing the total extent by the unique coral count.
weighted_summary <- ext_time_with_counts %>%
  group_by(SampleYear) %>%
  summarise(
    total_extent = sum(Extent, na.rm = TRUE),
    count_observations = first(total_unique_corals),  # Use `first` to avoid altering the count
    weighted_avg_extent = total_extent / count_observations
  )
  1. This code creates a line plot using the weighted_summary data frame to visualize the weighted average extent of Cliona sponge coverage over time. It maps SampleYear to the x-axis and weighted_avg_extent to the y-axis, adding a blue line and points to highlight the data, along with a minimal theme and descriptive labels for the title, x-axis, and y-axis.
ggplot(weighted_summary, aes(x = SampleYear, y = weighted_avg_extent)) +
  geom_line(color = "blue") +
  geom_point(size = 2) +
  labs(
    title = "Weighted Average Cliona Sponge Coverage Over Time",
    x = "Year",
    y = "Weighted Average Extent of Coverage"
  ) +
  theme_minimal()

Next I ask the question: How does the extent of coverage differ between coral species each year? Are there species that experience fluctuating or consistent coverage levels over time?

  1. This code creates a summary data frame, spp_summary, by grouping the ext_time_with_counts data frame by SampleYear and SPP (coral species). For each group, it calculates the total sponge coverage extent (total_extent), retrieves the total unique coral count for the year (count_observations), and computes the weighted average sponge extent (weighted_avg_extent) by dividing the total extent by the unique coral count. This provides a standardized metric of sponge coverage for each coral species across years.
spp_summary <- ext_time_with_counts %>%
  group_by(SampleYear, SPP) %>%
  summarise(
    total_extent = sum(Extent, na.rm = TRUE),
    count_observations = first(total_unique_corals),
    weighted_avg_extent = total_extent / count_observations
  )
  1. This code creates a line plot using the spp_summary data frame to visualize the weighted average extent of Cliona sponge coverage over time for each coral species (SPP). The x-axis represents SampleYear, the y-axis represents weighted_avg_extent, and different coral species are distinguished by color.
ggplot(spp_summary, aes(x = SampleYear, y = weighted_avg_extent, color = SPP)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Weighted Average Cliona Sponge Coverage by Coral Species Over Time",
    x = "Year",
    y = "Weighted Average Extent of Coverage"
  ) +
  theme_minimal()

Spencer Parr

Substrate Preference

Ivlev’s Electivity Index

The Ivlev’s Electivity Index (Ei) is a quantitative measure used to determine an organism’s preference for certain food or habitat types relative to their availability. In this thesis, it is applied to assess the Cliona delitrix sponge’s substrate preferences on coral reefs. Specifically, the index compares the proportion of coral species colonized by Cliona to the proportion of those species available in the environment. A positive Ei value indicates a preference, while negative values suggest avoidance. This helps in understanding how Cliona sponges select coral species under different environmental conditions.

Electivity indices measure the utilization of food types (r) in relation to their abundance or availability in the environment (p). Foods that constitute a larger proportion of the diet than of the available foods are considered preferred; conversely those proportionately underrepresented in the diet are avoided.

Figure 1: Prey selectivity according to Ivlev’s selectivity index of newly emerged brown trout fry in the River Iso (NW Spain). Modified from Elías, Jacinto & Díaz et al. (2013)

While Ivlev’s Electivity Index is typically used for dietary preferences, it can still be effectively applied to Cliona delitrix sponges in this thesis, despite corals not being a food source. Cliona sponges exhibit substrate preferences for certain coral species based on factors such as calcium carbonate concentration. Some stony corals have denser calcium carbonate skeletons, which may be harder for Cliona to excavate and colonize. Thus, the sponge’s choice is driven by its ability to grow and obtain necessary nutrients, similar to dietary selectivity in food webs.

  1. Coral Abundance by Species and Location:
  • The code first calculates how many times each coral species appears at each location (site). This gives a general sense of coral distribution across different locations, providing a simple count of occurrences per species at each site.
  1. Coral Abundance by Species, Location, Transect, and Year:
  • The second part refines the calculation to be more detailed by including both transects (smaller sampling areas within locations) and sample years. This breakdown gives the number of occurrences for each species not only by location but also by transect and year, making it possible to track coral abundance more precisely across both space and time.
# Calculate coral abundance by transect and year
coral_abundance <- cliona %>%
  group_by(SPP, Location, Transect, SampleYear) %>%
  summarise(Abundance = n()) %>%
  ungroup()

clionaclean1 <- cliona %>%
  left_join(coral_abundance, by = c("SPP", "Location", "Transect", "SampleYear"))
  1. Summarizing Total Presence and Abundance:
  • The code calculates two key metrics for each coral species (SPP). The code creates a summary of each coral species, providing an overall count of how often each species was present and its total abundance across the entire dataset. This allows for a better understanding of the relative presence and prevalence of different coral species in the study area.
# Summarize data to calculate total presence and abundance of each coral species
summary_cliona <- clionaclean1 %>%
  group_by(SPP) %>%
  summarise(TotalPresence = sum(presence), 
            TotalAbundance = sum(Abundance)) %>%  # Adjust for coral abundance
  ungroup()
  • This code calculates the proportions of presence and availability for each coral species and then computes Ivlev’s Electivity Index to determine whether species are disproportionately present relative to their availability in the environment.
# Calculate proportions and then Ivlev's Electivity Index adjusted for coral abundance
 Ivlev_Index <- summary_cliona %>%
   mutate(ProportionPresence = TotalPresence / sum(TotalPresence),
         ProportionAvailability = TotalAbundance / sum(TotalAbundance),  # Use abundance
          IvlevIndex = (ProportionPresence - ProportionAvailability) /
            (ProportionPresence + ProportionAvailability))

Ivlev_Index %>%
  kable(caption = "Ivlev's Electivity Index Adjusted for Coral Abundance", digits = 4)
Ivlev’s Electivity Index Adjusted for Coral Abundance
SPP TotalPresence TotalAbundance ProportionPresence ProportionAvailability IvlevIndex
AA 72 41794 0.0378 0.0618 -0.2408
AC 0 21 0.0000 0.0000 -1.0000
AF 0 229 0.0000 0.0003 -1.0000
AG 9 1142 0.0047 0.0017 0.4736
AGSP 3 17256 0.0016 0.0255 -0.8837
AH 9 10869 0.0047 0.0161 -0.5455
AL 29 9780 0.0152 0.0145 0.0259
AP 0 4 0.0000 0.0000 -1.0000
AT 0 13 0.0000 0.0000 -1.0000
AU 0 41 0.0000 0.0001 -1.0000
CN 7 403 0.0037 0.0006 0.7210
DCY 0 60 0.0000 0.0001 -1.0000
DL 7 365 0.0037 0.0005 0.7440
DSO 1 58 0.0005 0.0001 0.7193
EF 0 232 0.0000 0.0003 -1.0000
FF 0 17 0.0000 0.0000 -1.0000
HC 0 1355 0.0000 0.0020 -1.0000
IR 0 10 0.0000 0.0000 -1.0000
IS 0 4 0.0000 0.0000 -1.0000
MAFO 0 62 0.0000 0.0001 -1.0000
MAL 0 15 0.0000 0.0000 -1.0000
MAN 0 6 0.0000 0.0000 -1.0000
MAR 0 25 0.0000 0.0000 -1.0000
MC 87 7315 0.0457 0.0108 0.6172
MD 11 2743 0.0058 0.0041 0.1751
MDA 0 3 0.0000 0.0000 -1.0000
MDSP 0 33 0.0000 0.0000 -1.0000
MF 1 24 0.0005 0.0000 0.8734
MILA 8 4438 0.0042 0.0066 -0.2193
MILC 0 702 0.0000 0.0010 -1.0000
MILSP 0 4 0.0000 0.0000 -1.0000
ML 0 9 0.0000 0.0000 -1.0000
MM 0 317 0.0000 0.0005 -1.0000
MME 0 793 0.0000 0.0012 -1.0000
MP 0 51 0.0000 0.0001 -1.0000
MYSP 0 76 0.0000 0.0001 -1.0000
OA 396 26709 0.2080 0.0395 0.6808
OFAV 81 5270 0.0425 0.0078 0.6904
OFRA 166 78250 0.0872 0.1157 -0.1406
OX 107 38927 0.0562 0.0576 -0.0120
PA 304 224707 0.1597 0.3323 -0.3509
PB 0 1 0.0000 0.0000 -1.0000
PBSP 2 1460 0.0011 0.0022 -0.3454
PC 0 116 0.0000 0.0002 -1.0000
PCL 4 91 0.0021 0.0001 0.8796
PD 0 413 0.0000 0.0006 -1.0000
PF 1 435 0.0005 0.0006 -0.1010
PP 52 107932 0.0273 0.1596 -0.7078
PS 26 1128 0.0137 0.0017 0.7823
SB 0 8 0.0000 0.0000 -1.0000
SC 0 48 0.0000 0.0001 -1.0000
SCSP 0 53 0.0000 0.0001 -1.0000
SI 76 3601 0.0399 0.0053 0.7646
SL 0 1 0.0000 0.0000 -1.0000
SR 1 5785 0.0005 0.0086 -0.8843
SS 444 81086 0.2332 0.1199 0.3209
SSPP 0 1 0.0000 0.0000 -1.0000
TC 0 1 0.0000 0.0000 -1.0000
UNK 0 1 0.0000 0.0000 -1.0000
  1. The resulting graph shows a clear separation at zero, with bars above zero (blue) indicating a preference for certain coral species and bars below zero (red) indicating avoidance. This pattern reveals that Cliona generally avoids many coral species, with only a few species showing a positive electivity index, suggesting that Cliona selectively colonizes specific corals. This insight helps identify coral species that are more vulnerable to Cliona colonization, which is important for managing coral health and understanding Cliona’s ecological impact on reef systems.
ggplot(Ivlev_Index, aes(x = reorder(SPP, IvlevIndex), y = IvlevIndex, fill = IvlevIndex > 0)) +
  geom_bar(stat = "identity", width = 0.8) +
  labs(title = "Ivlev's Electivity Index for Cliona on Different Coral Species",
       x = "Coral Species",
       y = "Ivlev's Electivity Index") +
  theme_minimal(base_size = 14) +
  scale_fill_manual(values = c("red", "steelblue"), 
                    name = "Preference",
                    labels = c("Avoidance", "Preference")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 10),  # Adjust angle and font size
        plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  # Center and bold title
        legend.position = "top") +  # Move legend to the top
  geom_hline(yintercept = 0, linetype = "dashed", color = "black", size = 0.5)  # Add horizontal line at 0

The bar graph shows Ivlev’s Electivity Index for Cliona sponge prevalence on various coral species, quantifying the degree of preference (positive values) or avoidance (negative values) exhibited by Cliona. Coral species are displayed along the x-axis, with the y-axis representing the electivity index, ranging from -1 (complete avoidance) to 1 (complete preference). Most coral species show a strong avoidance by Cliona, evidenced by a majority of red bars with negative values. Notably, species such as MFCL, PS, and DL exhibit the highest positive electivity indices, indicating a clear preference for colonization. This suggests that these coral species may provide favorable conditions for Cliona, such as structural complexity or susceptibility to colonization. Conversely, species such as AC, AF, and AT demonstrate the most significant avoidance, with values near -1. This information suggests that there is selective colonization behavior of Cliona, which may have important implications for understanding its ecological interactions and impacts on coral reef ecosystems.

Spencer Parr

Site Map

Map 1: Spatial distribution of Cliona sponge prevalence across TCRMP monitoring sites in the U.S. Virgin Islands. Point size represents the mean prevalence percentage of Cliona sponges at each site, with larger points indicating higher prevalence. Colors depict reef complex classifications: nearshore (light green), offshore (medium green), and mesophotic (dark green). This visualization highlights variation in Cliona prevalence and reef complex types across the region.
Map 1: Spatial distribution of Cliona sponge prevalence across TCRMP monitoring sites in the U.S. Virgin Islands. Point size represents the mean prevalence percentage of Cliona sponges at each site, with larger points indicating higher prevalence. Colors depict reef complex classifications: nearshore (light green), offshore (medium green), and mesophotic (dark green). This visualization highlights variation in Cliona prevalence and reef complex types across the region.